bimodal distribution of reward

traditional reinforcement learning

converge to mean

converge to median

distributional reinforcement learning

converge to quantile

converge to expectile

asymmetric scaling factor
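
The terms above can be tied together with a small simulation. The following is a minimal sketch, not a reference implementation: the reward mixture, the step sizes, and the names sample_reward, learn, and BASE_LR are all assumptions made for illustration. It assumes the "asymmetric scaling factor" refers to using different step sizes for positive and negative prediction errors. Under that assumption, a symmetric update on the raw error settles near the mean, a symmetric update on the sign of the error settles near the median, and asymmetric scaling shifts those fixed points to an expectile or a quantile respectively.

```python
import random

BASE_LR = 0.002  # hypothetical base step size, chosen only for this sketch


def sample_reward(rng):
    """Draw from a hypothetical bimodal reward: 70% near 1.0, 30% near 5.0."""
    if rng.random() < 0.7:
        return rng.gauss(1.0, 0.2)
    return rng.gauss(5.0, 0.2)


def learn(tau, use_sign, steps=100_000, seed=0):
    """Run one value estimate with asymmetric step sizes.

    tau sets the asymmetry: positive errors are scaled by BASE_LR * tau,
    negative errors by BASE_LR * (1 - tau). tau = 0.5 is the symmetric case.
    use_sign=True updates on the sign of the error (quantile-like);
    use_sign=False updates on the error itself (expectile-like).
    Returns the estimate averaged over the second half of training.
    """
    rng = random.Random(seed)
    alpha_pos = BASE_LR * tau
    alpha_neg = BASE_LR * (1.0 - tau)
    v = 0.0
    tail_sum = 0.0
    tail_start = steps // 2
    for t in range(steps):
        delta = sample_reward(rng) - v  # prediction error
        if use_sign:
            err = (delta > 0) - (delta < 0)  # +1, 0, or -1
        else:
            err = delta
        alpha = alpha_pos if delta > 0 else alpha_neg
        v += alpha * err
        if t >= tail_start:
            tail_sum += v
    return tail_sum / (steps - tail_start)


if __name__ == "__main__":
    print("symmetric, error      (mean):     ", round(learn(0.5, False), 2))
    print("symmetric, sign       (median):   ", round(learn(0.5, True), 2))
    print("asymmetric 0.9, sign  (quantile): ", round(learn(0.9, True), 2))
    print("asymmetric 0.9, error (expectile):", round(learn(0.9, False), 2))
```

For this assumed mixture the true mean is 0.7 * 1.0 + 0.3 * 5.0 = 2.2, which lies between the two modes where rewards almost never occur, while the median sits inside the lower mode. With tau = 0.9 the sign-based estimate moves into the upper mode and the error-based estimate lands between the mean and the upper mode, so a set of learners with different tau values would jointly describe the shape of the reward distribution rather than a single summary value.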